
Make WorkspaceClient.dbutils lazy via cached_property #1470

Merged
Divyansh-db merged 1 commit into main from generate-cached-entry on Jun 11, 2026


Divyansh-db (Contributor) commented Jun 9, 2026

Summary

Makes WorkspaceClient.dbutils a functools.cached_property so consumers that never read it pay no construction cost and, on Spark Connect runtimes, never touch the legacy SparkContext path that databricks.sdk.runtime materializes on import. Includes four regression tests that lock in this contract.

Why

WorkspaceClient.__init__ used to call _make_dbutils(self._config) eagerly, which on a cluster imports databricks.sdk.runtime. On a Spark Connect (shared-access-mode) cluster, that import tries to materialize a legacy SparkContext, which raises CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT and crashes the constructor before any API call is made. Downstream consumers that never touch .dbutils (notably dbt-databricks Python models) hit this failure even though they have no use for dbutils; see #1463 and databricks/dbt-databricks#1252.

#1469 patches the runtime side as a defense-in-depth fallback (catch the materialization failure, fall back to RemoteDbUtils). This PR is the durable fix: callers that don't read .dbutils never trigger the build at all, sidestepping the entire code path. The first read still calls _make_dbutils once, lazily; subsequent reads hit the cached attribute in __dict__ at plain-attribute speed.
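A minimal sketch of the lazy-build pattern described above, using hypothetical names (`Client`, `_make_utils`, `BUILD_CALLS`) rather than the SDK's generated code: construction does no work, the first attribute read calls the builder once, and `cached_property` stores the result in the instance `__dict__`, so later reads never reach the descriptor again.

```python
from functools import cached_property

# Hypothetical stand-in for the SDK's _make_dbutils; counts how often it runs.
BUILD_CALLS = 0


def _make_utils(config):
    global BUILD_CALLS
    BUILD_CALLS += 1
    return {"config": config}  # placeholder for the real utilities object


class Client:
    def __init__(self, config):
        self._config = config  # note: no eager build in the constructor

    @cached_property
    def utils(self):
        # Runs only on the first read; the result is then stored in self.__dict__.
        return _make_utils(self._config)


client = Client(config="cfg")
print(BUILD_CALLS)              # 0: constructing the client built nothing
first = client.utils
print(BUILD_CALLS)              # 1: the first read invoked the builder
print(client.utils is first)    # True: the second read returns the cached object
print("utils" in vars(client))  # True: the value now lives in the instance __dict__
```

Because `cached_property` is a non-data descriptor, the instance attribute shadows it after the first read; `del client.utils` would clear the cached value and force a rebuild on the next access.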

What changed

databricks/sdk/__init__.py (generated from updated template):

  • from functools import cached_property added to the imports.
  • The eager self._dbutils = _make_dbutils(self._config) line is removed from __init__.
  • @property def dbutils (which returned the cached self._dbutils) becomes @cached_property def dbutils that calls _make_dbutils(self._config) on first access.

_dbutils was a private attribute with no external consumers (verified across the codebase), so removing it does not break any public surface.
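To illustrate why dropping the eager call matters, here is a sketch in which a hypothetical builder always fails the way the Spark Connect runtime does. The class names are illustrative, and the `RuntimeError` is an assumed stand-in for whatever exception the real runtime raises.

```python
from functools import cached_property


def _make_utils(config):
    # Hypothetical builder simulating the Spark Connect failure mode.
    raise RuntimeError("CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT")


class EagerClient:
    """Old shape: the constructor builds the utilities up front."""

    def __init__(self, config):
        self._config = config
        self._utils = _make_utils(config)  # fails before any API call

    @property
    def utils(self):
        return self._utils


class LazyClient:
    """New shape: the build is deferred until .utils is first read."""

    def __init__(self, config):
        self._config = config

    @cached_property
    def utils(self):
        return _make_utils(self._config)


def try_construct(cls):
    # Report whether constructing the client succeeds.
    try:
        cls(config="cfg")
        return "constructed"
    except RuntimeError as exc:
        return f"failed: {exc}"


print(try_construct(EagerClient))  # failed: CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT
print(try_construct(LazyClient))   # constructed
```

Only callers that actually read `.utils` on the lazy client would see the error. If the builder raises, `cached_property` stores nothing, so a later read simply retries the build.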

tests/test_client.py — four new tests:

  • test_dbutils_is_a_cached_property — descriptor type check.
  • test_workspace_client_init_does_not_build_dbutils — spies _make_dbutils, constructs a WorkspaceClient, asserts the spy was never called.
  • test_dbutils_first_access_builds_exactly_once — first read invokes _make_dbutils once (returns the spy's sentinel); second read still shows call_count == 1 and same identity.
  • test_workspace_client_constructs_on_spark_connect_without_touching_runtime — fakes dbruntime to raise CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT on any namespace materialization; asserts WorkspaceClient(config=...) succeeds and databricks.sdk.runtime is never imported during construction. This is the strongest evidence that the dbt-databricks failure mode is sidestepped by this change alone.
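The descriptor check and the spy-based checks in the first three tests can be approximated with `unittest.mock`. In this self-contained sketch, a `types.SimpleNamespace` named `sdk` plays the role of the module that owns the builder so that `mock.patch.object` has something to patch; `sdk`, `Client`, `_make_utils`, and the test names are hypothetical and are not taken from the PR's test file.

```python
import types
import unittest
from functools import cached_property
from unittest import mock

# Hypothetical stand-in for the module that defines the builder.
sdk = types.SimpleNamespace(_make_utils=lambda config: object())


class Client:
    def __init__(self, config):
        self._config = config

    @cached_property
    def utils(self):
        # Looked up on `sdk` at call time, so patch.object can intercept it.
        return sdk._make_utils(self._config)


class LazyUtilsTest(unittest.TestCase):
    def test_utils_is_a_cached_property(self):
        # Class-level access returns the descriptor object itself.
        self.assertIsInstance(Client.utils, cached_property)

    def test_init_does_not_build_utils(self):
        with mock.patch.object(sdk, "_make_utils") as spy:
            Client(config="cfg")
        spy.assert_not_called()

    def test_first_access_builds_exactly_once(self):
        sentinel = object()
        with mock.patch.object(sdk, "_make_utils", return_value=sentinel) as spy:
            client = Client(config="cfg")
            first = client.utils   # first read: builds via the spy
            second = client.utils  # second read: served from __dict__
        self.assertIs(first, sentinel)
        self.assertIs(second, first)
        self.assertEqual(spy.call_count, 1)
```

Run it with `python -m unittest` or pytest; both collect `unittest.TestCase` subclasses.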

How is this tested?

  • 4/4 new tests pass locally (0.03s).
  • Existing autospec tests in tests/test_client.py are untouched and still pass.
  • The fourth test is the negative-space proof: it asserts that databricks.sdk.runtime is not in sys.modules after WorkspaceClient(config=...), i.e., the constructor never reaches for the runtime module.
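The negative-space check relies on `sys.modules`: a module that was never imported never appears there. The sketch below reproduces the idea without Databricks dependencies by installing a hypothetical import hook for a fake module, `fake_runtime`, whose import always raises. The module name, hook class, and builder are assumptions for illustration; the PR's own test fakes `dbruntime` instead, and this hook is just one portable way to make an import fail on demand.

```python
import importlib
import importlib.abc
import importlib.util
import sys
from functools import cached_property

FAKE_RUNTIME = "fake_runtime"  # hypothetical module name standing in for the runtime


class ExplodingRuntimeFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serves FAKE_RUNTIME and fails while executing it, like a Spark Connect cluster."""

    def find_spec(self, fullname, path, target=None):
        if fullname == FAKE_RUNTIME:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer every other import to the normal finders

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        raise RuntimeError("CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT")


def _make_utils(config):
    # Hypothetical builder: the only code path that imports the runtime module.
    return importlib.import_module(FAKE_RUNTIME)


class Client:
    def __init__(self, config):
        self._config = config

    @cached_property
    def utils(self):
        return _make_utils(self._config)


def check_lazy_construction():
    finder = ExplodingRuntimeFinder()
    sys.meta_path.insert(0, finder)
    try:
        client = Client(config="cfg")
        # Construction must not have imported the runtime module.
        constructed_cleanly = FAKE_RUNTIME not in sys.modules
        try:
            _ = client.utils  # first read: the runtime import is attempted now
            access_error = None
        except RuntimeError as exc:
            access_error = str(exc)
        return constructed_cleanly, access_error
    finally:
        sys.meta_path.remove(finder)


print(check_lazy_construction())  # (True, 'CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT')
```

A failed import removes the half-initialized module from `sys.modules`, so the check stays clean even after the first `.utils` read raises.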

NO_CHANGELOG=true

Regenerated ``databricks/sdk/__init__.py`` with the updated template
(imports ``functools.cached_property``, drops the eager
``self._dbutils = _make_dbutils(self._config)`` from ``__init__``,
emits ``dbutils`` as a ``@cached_property`` that calls
``_make_dbutils`` on first access).

Adds four ``tests/test_client.py`` tests that lock in the contract:

- ``dbutils`` is a ``functools.cached_property`` descriptor on
  ``WorkspaceClient``.
- ``WorkspaceClient.__init__`` does not invoke ``_make_dbutils``.
- The first ``ws.dbutils`` read invokes ``_make_dbutils`` once;
  subsequent reads return the cached value without re-invoking.
- Constructing ``WorkspaceClient`` on a faked Spark Connect runtime
  (whose ``dbruntime`` raises ``CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT``
  on any namespace materialization) succeeds without importing
  ``databricks.sdk.runtime`` at all — the durable sidestep of
  databricks/dbt-databricks#1252.

Complements #1469 (which catches the same failure at runtime-module
import time as a defense-in-depth fallback).
Divyansh-db force-pushed the generate-cached-entry branch from 96b01a9 to b7a21f1 on June 9, 2026 18:21
Divyansh-db temporarily deployed to test-trigger-is on June 9, 2026 18:21 with GitHub Actions (deployment now inactive)
github-actions (bot) commented Jun 9, 2026

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1470
  • Commit SHA: b7a21f18d2b54cdc87b4d044bb170511aea4c1d1

Checks will be approved automatically on success.

Divyansh-db changed the title from "tests: validate WorkspaceClient.dbutils lazy-property behavior" to "Make WorkspaceClient.dbutils lazy via cached_property" on Jun 9, 2026
Divyansh-db marked this pull request as ready for review on June 9, 2026 18:21
Divyansh-db temporarily deployed to test-trigger-is on June 9, 2026 18:21 with GitHub Actions (deployment now inactive)
Divyansh-db requested a review from hectorcast-db on June 9, 2026 18:35
Divyansh-db added this pull request to the merge queue on Jun 11, 2026
github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks on Jun 11, 2026
Divyansh-db added this pull request to the merge queue on Jun 11, 2026
Merged via the queue into main with commit d01f89a on Jun 11, 2026
14 of 16 checks passed
Divyansh-db deleted the generate-cached-entry branch on June 11, 2026 12:33


2 participants